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ABSTRAGT 

We present a strong lensing modeling technique based on versatile basis sets for the lens and source 
planes. Our method uses high performance Monte Garlo algorithms, allows for an adaptive build 
up of complexity and bridges the gap between parametric and pixel based reconstruction methods. 
We apply our method to a HST image of the strong lens system RXJ1131-1231 and show that our 
method finds a reliable solution and is able to detect substructure in the lens and source planes 
simultaneously. Using mock data we show that our method is sensitive to sub-clumps with masses 
four orders of magnitude smaller than the main lens, which corresponds to about IO^Mq, without 
prior knowledge on the position and mass of the sub-clump. The modeling approach is flexible and 
maximises automation to facilitate the analysis of the large number of strong lensing systems expected 
in upcoming wide field surveys. The resulting search for dark sub-clumps in these systems, without 
mass-to-light priors, offers promise for probing physics beyond the standard model in the dark matter 
sector. 

Subject headings: Gravitational lensing, strong lensing, cosmology 


1. INTRODUCTION 

The standard cosmological model is based on the stan¬ 
dard model of particle physics, Einsteins theory of Gen¬ 
eral Relativity, a cosmological constant, cold dark mat¬ 
ter and inflation. The physical origin of the cosmological 
constant, inflation and dark matter remains a mystery to 
date. The predictions of the expansion history of the uni¬ 
verse has been probed with high precision and structure 
formation has been tested from the horizon scale down 
to about IMpc or even below 


pears. This effect is well suited for m any astroph ysical 


and cosmological applications (see e.g. |Tren 2010 


a re¬ 


view focused on galaxy sized lenses). Strong Lensing 


ture in the lens (Koopmans 

|2005t Vegetti & Koopmans 

2009). This technique has 

been successfully applied to 


et al. 2015 Dawson et al. 2013). 




Planck-Collaboration 
I'he smallest scale 


to about 2 X IO^Mq masses have been detected. Sub 
structure also has an effect on th e flux ratios in multi¬ 
ple lensed quasar images (see e.g., Metcalf fc Zhao||2002 


tests come from the Lyman-alpha forest (see e.g., Sel- 


jak et al. 

20061) 

lous quad 

Irupofe 


Kochanek fc Dalall 2004| |Amara et aLpOO*^ IVIetcalf V 
Amara |2012||Xuet al.|2015|| . Anomalous hux ratios have 


and strong and weak lensing in anoma- thus b een reported in the literature. [Takahashi fc Inoue 


smaller scales in the non-linear regime, there are obser¬ 
vational and theoretical challenges in bringing model and 
data in agreement. This problem occurs predominantly 
in the number, phase space densities and density profiles 
when comparing simulations of dark matter substructure 
with observations of luminous satellite galaxies in our 


Milky Way (see e.g 

., Kauffmann et al.|1993l Klypin et al. 

I 999 I Moore et al. 

19991 Boylan-Kolcliin et al. 201l|). A 


dark matter particle may have an effect on structure for¬ 
mation on small scales without having an effect on larger 
scales. Probing the small scale structure formation and 
mass distribution may thus provide information beyond 
the ACDM model. 

Strong lensing is a pow erful probe to test structure for- 


mation on small scales (|Metcalf & Madau 2001 Dalai 

& KochanekI 120021 

Yoo et al. 1120061 |Keeton V IVIous- 

takas|j2009i Mousta 

cas et al. 2009|). Strong lensing is 

the effect caused by 
ground over-density 

the bending of light by massive fore- 
(e.g. galaxy, group or cluster) such 


that multiple images of the same background object ap- 
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(2014) pointed out that the anomalous hux ratios mea¬ 
sured can be accounted by line-of-sight structure and do 
not have to necessarily come from structure within the 
lens. With recent and upcoming large scale surveys new 
area and depth becomes ava i lable to discover strong lens 
systems. Oguri & Marshall (2010) forecasted thousands 
of lensed quasar systems from DES and LSST. These 
datasets will help to constrain the statistical features of 
the small scale structure imprinted in the strong lens¬ 
ing signal. The increasing number of strong lens systems 
will in the future need to be analyzed with automated 
modeling approaches. 

The aim of this paper is to describe a lens modeling 
approach that can be applied to different lens systems 
without adjusting parameter priors by hand and uses all 
the information contained in a image to constrain the 
projected mass density of the lens with a special empha¬ 
sis on substructure. Our model approach is based on 
parameterized basis sets in the source surface brightness 
and lens model. The model framework can handle an 
adaptive complexity in the source and lens models. In 
addition to the basis sets, we show the power of modern 
sampling techniques and we make use of fast computa¬ 
tional methods. 

The paper is structured as follow: In Section we give 
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an overview of existing lens model techniques and show 
how they relate to our modeling approach. In Section 
the source surface brightness and lens potential basis sets 
on which our model relies on are introduced. Section [4] 
describes the model fitting procedure and in particular 
how the source surface brightness reconstruction is done 
and how we deal with the high number of non-linear pa¬ 
rameters in the lens model. We test our fitting procedure 
on mock data and on Hubble data of the lens system 
RXJ1131-1231. In Section we study how well we can 
detect substructure in a lens model without prior infor¬ 
mation on the mass, slope and position. This section is 
followed by a conclusion (Section]^. 

2. OVERVIEW OF LENS MODEL TECHNIQUES 

Galaxy-size strong lenses have been modeled exten¬ 
sively in the literature (see references below in this sec¬ 
tion). The following aspects have to be modeled in a 
strong lens system when comparing data and model on 
the image level: 

• the lens mass model 


• the source surface brightness profile 

• the lens surface brightness profile 

• the point spread function (PSF) 

Depending on the lens system and instrument, one has to 
also model dust extinction, external convergence, micro 
lensing by stars and other aspects. 

Depending on the scientific aim, the main focus is typ¬ 
ically more on the source surface brightness reconstruc¬ 
tion or on the lens mass reconstruction. In both cases one 
can, broadly speaking, divide the modeling techniques in 
two regimes: 

(1) Parametric reconstruction: Using simple and physi¬ 
cally motivated functional forms with a controllab l e num¬ 


ber of parameters 10 ) (e.g., Kochanek 


et al. 1996 Keeton 2QQ1| | Julio et al. ' 2007). A control- 


1991 Kneib 


lable number of parameters implies that one can fully 
explore the parameter space and convergence to the best 
fitting configuration can often be obtained. 

(2) Pixel based reconstruction: This is most often done 
with a grid where each pixel is treated as a free param¬ 
eter. Pixelised source surface brightness invers i ons have 
been proposed by e.g. Wallington et al. (1996); Warren 
& 



. ■ ■ __ 

(|2U1U|). These methods often rely on a regularization of 

the pixel grid when there is not a unique solution. De¬ 
pending on the regularization procedures, priors and the 
pixel siz e, one can come to diffe rent recons tructed sources 
(see e.g. ^uyu et al.|2QQ6 2013). Recently Tagore & Kee¬ 


ton 


( 2Q14| ) did an analysis of statistical and systematic 


uncertainties in pixel-based source reconstructions. Fur¬ 
thermore, these methods are computationally expensive 
as they rely on large matrix inversions. For the lens 


plied by e.g. Blandford et al. ( 

200 1|); Saha & Williams 

(|2004at; Braaac et aL| (2UU5D; 

KoopmansI (I2005D; Ibaha 

et al.IdzUUbD; buyu Blandford 

15DDSI)T 

Jee et al. 

(i2TO7i)r 

Vegetti ^ Koopmans (|2UU9|); buyu et a. 

LZ 

c: 

c: 

Vegetti 




et ahl (120121); Coles et al. (|2014|) and even mesh-free mod- 
|Merten|(|2014p. 

11 ( 


Computational techniques also vary for different mod¬ 
eling approaches. Ray-tracing has generally been used 
to map extended surface brightness from the source to 
the image plane. If significant surface brightness varia¬ 
tions occur on very small scales, such as for quasars due 
to their compact size, simple ray-tracing can lead to nu¬ 
merical inaccuracies. One way to model such systems 
is to approximate quasars as point sources. One then 
solves the lens equation numeri cally for the positi ons in 
the image plane (recently e.g. Suyu et al. 2013). An 
alternative to avoid the point source approximation is 
adaptive mesh refinemen t (e.g., Metcalf fc Amar^|2Q12 


Metcalf & Petkova 2Q14| ) which changes the ray-tracing 


rehnement scale depending on the local spacial variation 
of the source at different image positions. 

In standard ACDM, the self-similarity of dark matter 
indicates that the same amount of complexity as seen 
in galaxy clusters must also be present in galaxy-sized 
strong lens systems. Its effect is much weaker in terms of 
deflection and magnification, but it must still be present. 
Ideally, we want a model that is flexible such that it can 
describe any lens mass and source surface brightness dis¬ 
tribution. For this model we need to be able to explore 
its degeneracies and to converge to the ‘true’ solution to 
extract the information contained in a strong lens sys¬ 
tem. 

One of the aims of our work is to fill the gap between 
the parametric and non-parametric models. We do so by 
choosing basis sets that we treat in a fully parametrized 
form. 


3. CHOICE OF BASIS SETS 

In the following sections we describe our choices for 
basis sets and, in addition, we present how we produce 
mock data given a set of parameter values. 


3.1. Basis for the source 


We make use of shapelets (introduced by Refregier 


20031 |Refregier & Bacon|2003| jMassey & Refregier|2U(J5|) 

in the source surface brightness plane. We impleniented 
th e two-diniension al Cartesian shapelets (Eq. 1 and 18 
Refregier||2003 , or in Appendix [A| of this wo rk). Inde- 


pendent of this work Tagore & Jackson ( |2015 ) proposed 


a different method to use shapelets in the source recon¬ 
struction. Shapelets form a complete orthonormal basis 
for an infinite series. Restricting the shapelet basis to 
order n provides us with a finite basis set that is linked 
to the scales being modeled. If we wish to model a larger 
range of spatial scales in the surface brightness profile, 
we need to use more high order shapelets. The number 
of basis functions m is related to the restricted order n 
by m = (n + l)(n + 2)/2. The shapelet basis functions al¬ 
low us to dynamically adapt to a given problem. We can 
increase the complexity when we need them and reduce 
it when it is not appropriate. Apart from the order n one 
can also set the reference scale (3 of the basis function. 
Minimal and maximal scales (/min, /max) being resolved 
up to order n with reference scale [3 is /min = /d/Vn -h 1 
and /max = I3y/n + 1. The parameter is a user spec¬ 
ified choice. Another choice is the peak position of the 
shapelet center (xo,^o)- For any finite order in n, the 
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choice of the center is crucial for the fitting result. A 
natural choice for (xo,^o) is the center of the light pro¬ 
file of the source galaxy. In that sense (xo,^o) must be 
interpreted as two non-linear parameters. 


3.2. Basis for the lens 

Choosing a realistic basis set for the lens mass dis¬ 
tribution is a challenging task as there are many differ¬ 
ent scales involved, especially when considering low mass 
sub-clumps. These sub-clumps are very small in scale 
but are also very dense. Having a basis set which allows 
a general description of such clumpy halos on different 
scales typically involves a large number of parameters. 
Depending on the sub-clump mass limit being consid¬ 
ered, there are hundreds or even thousands of sub-clumps 
expected. A minimal description requires at least in¬ 
formation about individual positions, masses and con¬ 
centrations. Such a description leads to a degenerate 
and non-unique lens model (e.g. Keeton 2010 Kneib & 
Natarajan 2011). For cluster lenses, the typical masses 
of substructure are several orders of magnitude below the 
total lens mass, but it is possible to give strong priors on 
the location of the substructure, namely at the position 
of the luminous galaxies. For detecting invisible sub¬ 
structure such a prior can not be used. As often called 
‘non-parametric’ or Tree-form’ approach, meaning there 
are more parameters than data constraints (i.e. deliber- 
ate ly under-constrained) was p ropo sed and implemented 
by Saha & Williams (2004b) and Coles et al. (2014). 
Using the catalog level image position information and 
time-delay measurements, there is far less information 
available than parameters to be constrained. One is able 
to draw random realizations of lens models that meet 
all the constraints. Statements about the validity of a 
specific lens model can only be drawn statistically. Do¬ 
ing a comparison on the image level where about 10^ - 
10^ pixels are involved, more information is available to 
constrain the model. 

In our approach we start with a softened p ower-law 
ellipt ical potential (SPEP) (e.g., discussed by [Barkana 


1998). The leasing potential 4> is parameterized as 


2E^ /o2 , 2 \ 


T]^ 


El 


( 1 ) 


where 


pI 


x\ + x\l COS^ Pp 


( 2 ) 


with cos Pp being the axis ratio of the potential, 77 the 
power-law index, Ep the normalization of the potential, 
Sp the smoothing length and Xi ^2 the position rotated 
such that xi is in the direction of the major axis of the 
potential. For an addition al sub-clump, we mod el them 


either as a spherical NFW Navarro et al. (1997) profile 
or a spherical power-law potential (bPPj. For both func¬ 
tions, we set the softening length Sp = const = 0.0001” 
for computational reasons. In that sense the softening is 
virtually zero and is not a free parameter in this work. 

Combining the two functions (SPEP and SPP) we get 
6 -1- 4 = 10 non-linear free and partially degenerated pa¬ 
rameters to be fitted. With this parameterization we 
expect a good overall fit to many different lens systems 
and perhaps to catch the largest substructure within the 


lens, visible or invisible. Such tests are shown in section 

El 

In addition, we include two dimensional Cartesian 
sha pelet s (same functional form as for the source in Sec¬ 
tion 3.1|) in the potential. We choose the scale factor {3 to 
be tne Einstein radius. This allows for perturbations at 
the global scale of the lens that can not be made with an¬ 
other peaked profile. The first derivatives of the potential 
(deflection angle) and second order derivatives (conver¬ 
gence and shear) can be computed analytically and can 
be expressed within the same shapelet basis functions 
(See Appendix [A|), thus enabling fast computations. 


3.3. Basis for the lens light 

Eo r the descrip tion of the lens light, we use Sersic pro¬ 
files ( Sersic|19^ . Depending on the lens galaxy, adding 
multiple Sersic p rofiles can lead to better fits (see e.g. 
Suyn et ar]|2Q13) ). 


3.4. Image making 

Having a parametric description of the source surface 
brightness, a possible point source, the lens potential, the 
lens light and the PSE, an image can be generated in the 
following steps: 


1. Starting in the image plane one evaluates the an¬ 
alytic expression of the deflection angle using grid 
based ray-tracing. The resolution has to be of or¬ 
der (or slightly smaller than) P/^/rl to capture the 
features in the extended source model. 


2. We then compute the point source image in a itera¬ 
tive ray-shooting procedure starting from the local 
minimas of the relative distance to the point source 
of step 1. Corrections for the next proposed ray¬ 
shooting position can be made when considering 
the relative displacement to the point source and 
the second order derivatives of the lens potential. 
The requirement of the precision of the point source 
position in the image plane of about 1/1000 of the 
pixel size can be reached within very few iterations. 

3. Eor the point sources, which appear as PSE’s, we 
normalize the externally estimated PSE to their 
intrinsic brightness and lens magnification. We 
do not lose significant computational speed when 
modeling the PSE further out to the diffraction 
spikes. Eor the extended surface brightness a nu¬ 
merical convolution needs to be made. This can 
be done either at the pixel or sub-pixel level. This 
step is the most expensive computational process 
in the forward image modeling. The process scales 
roughly linearly with the number of pixels or sub¬ 
pixels in the convolution kernel. We use East- 
Eourier-Transforms implemented in a scipy rou¬ 
tine in python. Our default kernel size is 15 x 15 
pixels. 

4. The lens light is added with analytical Sersic pro¬ 
files convolved with the same PSE kernel as the 
extended source surface brightness. 

Eor the modeling, we do not add noise. When simulating 
realistic images, we add a Gaussian background noise 
with mean zero to all pixels and a scaled Poisson noise 
on the signal (pixel by pixel). 
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4. MODEL FITTING 

For the modeling, we have three questions to answer: 


lensed 


convolved 


1. What is the best fit configuration of the model to 
match the data of a specific lens system? We want 
to find the global minima for the value. 

2. What level of complexity is needed to fit the data to 
a certain level? We want to compare consistency 
with the data by analysing the reduced value 
and compare different model configurations with a 
Bayes factor analysis. 

3. How well is the model solution determined by the 
data? We want to sample the parameter space and 
determine confidence intervals. 


As a result, many choices have to be made in the lens 
modeling. More than 10 parameters in the lens model 
with non-linear behavior have to be specified. For a re¬ 
alistic surface brightness description the shapelet order 
n can be higher than 20 which corresponds to 154 ba¬ 
sis’ and their corresponding coefficients. Given this level 
of complexity, even the first question on its own is diffi¬ 
cult to address. Once we have a method for addressing 
the first question, repeating the procedure with differ¬ 
ent choices of complexity and parameterization will pro¬ 
vide an answer to question Question can then be 
answered with a Bayesian inference method such as a 
Markov chain Monte Carlo (MCMC) samp ling. For this 
we us e the software package CosmoHammer dA keret et al. 
2Q13|), wh i ch is b ased on the emcee method m Goodman 
Weare (|2Q1 Q|) 

Mackey et al.j (15 

massive parallelization in the sampling process. In this 
section, we focus on question We will describe in de¬ 
tail the methods and procedures we apply to make the 
algorithm converge to the best fit lens model configura¬ 
tion. Question [2 and are addressed with examples in 
Section [5] and (BT 


and its implementation by Lbreman- 
!l7p013|). The software package allows tor 
lelization i 


4.1. Source surface brightness reconstruction 

In our method we use a weighted linear least square ap¬ 
proach to reconstruct the source surface brightness. This 
is a standard procedure to minimize the quadratic dis¬ 
tance between data and model with weighted error mea¬ 
sures. The estimation of the covariance can also be cal¬ 
culated (see Eqn[^-[^ below). The minimization problem 
has to be linear. L^et y be the data vector of dimension d. 
In our system, it contains all the pixel values of the image 
in the area of interest for a surface brightness reconstruc¬ 
tion. Let W be the weight matrix of dimension dx d. In 
a likelihood interpretation, W is the inverse covariance 
matrix of the data. Assuming the pixel errors are uncor¬ 
related kF is a diagonal matrix. Let f be the parameter 
vector of dimension m. In our case, f is the vector of 
the coefficients of the linear combination of shapelet ba¬ 
sis functions. The number of shapelet basis functions m 
dep ends on the shapelet order n as described in section 
|3.1[ Let X be the linear response matrix of the shapelet 
parameters on the pixel values in the image plane of di¬ 
mension d X m. The product Xf describes a lensed and 
convolved surface brightness on the image plane. X can 
be computed by mapping all m shapelet basis functions 


source plane 


• 

c 

1 

I 

W 



SSSE 


Fig. I.— An illustration of the modeling of the source surface 
brightness with three different shapelet basis functions. Left pan¬ 
els: Shapelet basis function in the source plane. Middle panels: 
Mapped shapelets in the image plane with a SIS lens via ray¬ 
tracing. Right panels: PSF convolved image. From top to bottom: 
Shapelets with (ni , 712 ) = (I, 0), (2,1), (3, 5). 


from the source to the image plane, convolve and resize 
them separately on the pixel scale. The computational 
cost of this procedure is linear in the number of basis 
functions involved and dominates the process for low m. 

Figure illustrates how the shapelet basis functions 
are mapped. The problem of finding the best source 
configuration ^0 given the data y and the weights W can 
be posed as a weighted linear least square problem: 

Co = min5||Wi/2(#-VC)|| (3) 

This equation can be written as 

{X^WX)io = X^Wy (4) 

whose solution is given by 

ia = {X^WX)-'^X^Wy. (5) 

The covariance matrix of is therefore given by 

= {X'^WX)-^. ( 6 ) 


becomes important when marginalizing the proba- 
bility distribution over 

The procedure involves a matrix inversion of dimen¬ 
sion m X m. The computational cost and memory al¬ 
location of this inversion becomes more significant with 
larger m. Moreover, the matrix {X^WX) has to be in¬ 
vertible. If not, this method fails to find a solution and 
regularization is needed. A grid base d regularization was 
introduced by | Warren & Dye (2 003). Gonceptua l ly and 
computationally, the method oT Warren & Dye (2003) 
and the one presented in this paper difter significantly. 
The matrix {X^WX) is a dense matrix where as the 
matrix in grid based regularization can be sparse. A 
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sparse matrix can only be maintained when having a 
small PSF(e.g. 5x5 pixel). We use in our method a 
default PSF kernel of 15 x 15 pixels and a further ex¬ 
tension affects only the FFT-convolution of the lensed 
shapelet basis functions. Our method is well suited to re¬ 
construct also lensed sources in images with larger PSF’s 
than HST images. But the main gain of our method is 
in terms of the number of parameters (i.e. the size of the 
matrix). Well chosen basis sets can allow for a smaller 
number of parameters compared to grid based methods 
significantly. 

In Figure we take a mock image produced with a 
chosen (inch point sources) with maximum shapelet 
order rimax = 40 and added Poisson and Gaussian back¬ 
ground noise to the image. We check the reconstruction 
by computing the relative residuals and their correlation 
function. We see from Figure that the results are al¬ 
most consistent with purely uncorrelated noise. Only 
for very small separations, the correlation is marginally 
smaller than with noise. This effect highly depends on 
the signal-to-noise ratio of the shapelets. Since we know 
the input source surface brightness, we can also check 
its reconstruction. The error in input vs. output in 
the source has features which represent the scales of the 
shapelet basis functions. The relative error of the surface 
brightness is about 10% or less. This reconstruction pro¬ 
cess with Umax = 40 and 15 x 15 pixel convolution kernel 
takes about 4s on a standard personal computer. When 
reducing the number of shapelet coefficients to Umax = 20 
the reconstruction falls below Is. 

The specific reconstruction depends on the noise re¬ 
alization. By repeating the reconstruction 1000 times 
with different noise realization^ we find that the recon¬ 
struction is stable. In Figure we plot the reduced 
distribution of the different realizations. We find a mean 
Xred ~ 1-0015 with a standard deviation of a = 0.009. 


4.2. Convergence techniques 

In the previous section, we showed that we can linearize 
all parameters of the source model given a specific lens 
model and thus we can express it as a linear minimiza¬ 
tion problem. The marginalization of the linear param- 
eters can be made analytically (see Section 4.2.3 below). 
Changes in the lens model however have a non-linear ef¬ 
fect on the image. In that sense, we can marginalize over 
many parameters in our model and are left with about 
10-30 non-linear parameters. To explore this space we 
use a P article Swarm Optimization (PSO)|Kennedy et al. 
(2001) algorithm to find the minimum of the parameter 


space. The algorithm is described in more detail with an 
illustration in Appendix [C) 

Convergence towards the global minimum in parame¬ 
ter space can depend on several factors. It depends on 
the volume of the parameter space, the number of local 
minima and the shape of the cost function around the 
absolute minimum. As we are marginalizing over all the 
source surface brightness parameters, one can have un¬ 
expected behavior of the cost function over the lens pa¬ 
rameters. In the following we describe our convergence 
method which goes beyond simply applying the PSO al¬ 
gorithm and which is important for the performance of 
our method. 


4.2.1. Parameterization 


The sampling in parameter space can be made in any 
parameterization with a bijective transformation to the 
originally described form. The parameterization can 
have a significant impact on the convergence capacity 
and performance of a specific algorithm. If there are 
periodic boundaries in a specific parameterization, some 
algorithms can have difficulties. In our model, this is 
the case for the parameter of the semi-major axis an¬ 
gle of the elliptical lens potential Oq which is defined 
in the range [0,7r). The model can continuously ro¬ 
tate the axis but the parameter space has to jump from 
0 to TT, or vise versa. Mapping Oq and the axis ra¬ 
tio q = cos/3 into ellipticity parameters (61,62) with 
/ : [0, tt) X (0,1] (-1,1) X (-1,1) given as 

f = (rrf 

provides a continuous link between the lensing poten¬ 
tial and the parameter space. Reducing the surface area 
of boundary conditions in the parameter space can also 
reduce the number of local minima at the boundary sur¬ 
face. The fewer local minima there are the better one 
can find the global minimum. Priors on (^, q)^ i.e. based 
on the observed light distribution, must be transformed 
into priors on (61,62) accordingly. In this work we assign 
uniform uninformative priors on (61,62). 

In general, the particular choice of the parameteriza¬ 
tion can be crucial. The smoother a change in parameter 
space reflects a small change in the model output, the 
better a convergence algorithm can deal with the sys¬ 
tem. The fewer constraints and boundary surfaces there 
are in the parameter space, the more general convergence 
algorithm manage to converge to the global minimum. 

4.2.2. Convergenee with additional eonstraints 

In cases where the source galaxy hosts a quasar that 
dominates the luminosity, its lensed positions in the im¬ 
age plane can be determined by the data without knowl¬ 
edge of the lens, source position or the extended sur¬ 
face brightness. The feature in the image is very well 
predicted by the PSF model and dominates the bright¬ 
ness over an extended area in the image. Any proposed 
lens model that predicts the image positions displaced 
from the features in the image will be excluded by the 
data with high significance. The quasar point sources in¬ 
troduce a degeneracy of acceptable solutions within the 
original parameter space. Knowing about this degener¬ 
acy can lead to faster convergence. 

When having N bright point source images, there are 
2N constraints to the system (their positions in the im¬ 
age plane). This reduces the effective dimensionality of 
the parameter space by 2N. Lensing has three symme¬ 
tries imprinted in the positional information: Two trans¬ 
lations and one rotation. These transformations do not 
change the lens model apart from its own transformation. 

In general, we can use any parameterization Oi of an 
originally M-dimensional parameter space to dimension 
n = M — 2N + 3 (with N >= 2) if there exists a bi¬ 
jective transformation (an exact one-to-one mapping of 
the two sets) to the original parameter space with the 
applied constraints. In the case of four bright images of 
a quasar, we determine an (M — 5)-dimensional param¬ 
eter space and solve for the source plane position of the 
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Fig. 2. — Demonstration of the source surface brightness reconstruction with upper panels showing the image plane and lower ones for 
the source plane. From left to right: Initial mock image (source), reconstructed image (source), relative residuals, ID correlation function 
of residuals. The image is almost perfectly reproduced even without significant residual correlations. The features of the source surface 
brightness profile is very well reproduced. The relative intensities of input vs. output is 10% or below. The spacial correlation of the 
relative difference is enhanced. This feature reffects the properties of the shapelet basis functions involved and the minimal and maximal 
scales of those. 
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variance on the intensity at pixel i as 

^pixel,i ~ ^bkgd,i + /^model,i (8) 

where crbkgd,i is the background noise estimated form the 
image, dmode\,i the model prediction at pixel i and / a 
scaling factor. A pure Poisson noise results in / being the 
product of exposure time and gain. The likelihood of the 
data ddata with A'd image pixels given a model dmodei 
with non-linear lens model parameters 6 can then be 
written as a marginalization over the linear parameters 
the source surface brightness parameters: 

P{dd at a |6i) = J d^P{dd ata|^,0^(0 (9) 


Fig. 3. — The distribution of the reduced values of 1000 real¬ 
izations of the image reconstruction process (see Figure is plot¬ 
ted. Each realization differ in the Gaussian and Poisson noise re¬ 
alization. The mean value of the distribution is Xred ~ 1-0015 and 
the spread is cr = 0.009. 

quasar and five additional lens model parameter with a 
non-linear solver. This reduces the non-linear parameter 
space in the PSO process and leads to faster convergence 
without breaking any degeneracies. The choice of the five 
lens model parameters is arbitrary as long as the param¬ 
eters can provide a solution ot the point source mapping. 
Priors on these parameters have also to be applied in the 
sampling process. 


4.2.3. Likelihood computation 
The likelihood calculation on the image level is the 


product of the likelihoods of each pixel (see e.g. Suyu 


et al. 2013 for a similar approach). We estimate the 


where 


^ d 

P{ddata\d,i) = — expy] 

i=l 

with dmodei = Is the normalization 

Zd = (27r)^<>/2j|api.ei,i (11) 

and P(C) the prior distribution of the shapelet coeffi¬ 
cients. We assume a uniform prior distribution which is 
independent of the lens model. The integral in equation 
can be computed around the maximum Co coming 
from equation (O with covariance matrix from equa¬ 
tion With a second order Taylor expansion around 


(^data,i <^model,i) 
2^pixel,i 
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Fig. 4.— Illustration of a combined PSO and MCMC chain in a 9 dimensional non-linear parameter space. The blue lines connect the 
best fit particle during the PSO process. The red lines mark the true input parameter. Dark (light) gray contours mark the 68%-CL 
(95%-CL) interval estimated from the MCMC process. 


Co, equation (§ can be written as 

^’(C^datal^, ^0 + A^) « P(ddata|0,^o) ' 

^ ( 12 ) 

Integrating equation ( [1^ over results in 

P(rfdata|0) = P(rfdata|0, Co) [(27r)'"det(M^)] ^ (13) 

In principle, equation ( |10| ) is the cost function to use 
for image comparison. The information about the image 
positions is included in this cost function. The problem 
with this cost function is that convergence to a good 


model can be difficult. The use of additional or derived 
information, such as the explicit image positions, can 
facilitate convergence. 

4.2.4. Steps towards convergence 

Having presented our model parameterization in Sec¬ 
tion and discussed certain aspects of model fitting and 
convergence in the previous paragraphs, we describe our 
steps which allows us to find a reasonable fit to the data. 
Figure illustrates the framework. Prior to the con¬ 
vergence algorithm, the image data has to be analyzed, 
the model configuration has to be chosen, the prior val- 
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ues have to be set and the specific configuration of the 
PSO process have to be given as an input. The fitting 
should be done within an automated process where no 
interaction of the modeler is needed. The output of the 
PSO run can then be analyzed by the modeler in terms 
of convergence and quality of fitting. This may lead to 
a change in the model parameters, functions, configura¬ 
tion etc and the process is run again. Once convergence 
is achieved and the fitting result is good, the MCMC pro¬ 
cess is run to map out the valid parameter space given 
the model parameters chosen. Figure illustrates the 
PSO and MCMC process in a 9 dimensional parameter 
space. The thin blue lines corresponds to the PSO pro¬ 
cess. Once this process is converged we start the MCMC 
process around this position (light and dark gray con¬ 
tours). Certain parameters are more degenerate than 
others. We try to map the parameter space such that 
remaining degeneracies are controllable. 


5. EXAMPLE - RXJ1131-1231 

In the following, we test our method on the gravita- 
tional lens RXJ11 31-1231. This lens was discovered by 
Sluse et al. ([2003) and the redshift of the lens zi = 0.295 
and of the background quasar source = 0.658 was 
determined spectroscopically. The le n s was extensively 
model e d by Claeskens e t al.| (2006); Brewer & Lewis 
(|2008|);|lSuyu et al.|(|2013|). We use the skme archival Hbl 
ACb linage in hlt er L'8l4W (GO 9744; PI: Kochanek) as 
Suyu et al. (2013) for our lens modeling and follow a sim- 
ilar procedure tor the reduction process and error estima¬ 
tion. We make use of the MultiDrizzle product from 
the HST archive. The PSF is estimated from stacking of 
nearby stars. We estimate a PSF model error by comput¬ 
ing the variance in each pixel from the different stars after 
a sub-pixel alignment with an interpolation done using 
all the stars. We assume that this model error is propor¬ 
tional to the intensity of the point source. This method is 
meant to demonstrate our method in fitting the best con¬ 
figuration. The lens model is parameterized as a SPEP 
(ellipsoid) and a second SPP (round) profile (see Eqn[^. 
Furthermore we choose 15 shapelet basis sets in the po¬ 
tential and a consta nt external shear c omponent. For the 
lens light we follow Suyu et al. (2013) and use two ellip¬ 
tical Sersic profiles with common centroids and position 
angles to describe the main lens galaxy and a circular 
Sersic profile with Usersic = 1 foi* the small companion 
galaxy. 

Figure shows our result of the fitting process to the 
HST image. In the upper left panel we show the reduced 
data. Upper middle shows the best fit model. On the 
upper right the normalized residuals are plotted. The 
reduced value of this fit is Xred = 1-5 without adjust¬ 
ing any Poisson factors nor the background noise level 
originally derived from the image data products. We 
clearly see that there are significant residuals around the 
point sources which indicates clearly that our PSF model 
needs further improvement and that even our error model 
on the PSF seems to underestimate the model error in 
certain regions. Furthermore, extended regions of over- 
or under-fitting indicate that the lens model can be im¬ 
proved. Source surface brightness adoptions could have 
acted to reduce the error in the fit in case of a perfect lens 
model. The lower left panel shows the reconstructed ex¬ 
tended source surface brightness profile. We clearly see 


the presumably star forming clumps which lead to the 
features in the extended Einstein ring. In the lower mid¬ 
dle panel of Figure our lens model is shown in terms 
of the convergence map. We notice that without mass- 
to-light priors, the position of the two modeled clumps 
is strikingly close to the position of the luminous galaxy 
and its companion. In the lower right panel, the magni¬ 
fication map is shown. The reconstruction of the image 
depends on the number of shapelet coefficients used. In 
Appendix we discuss the effect of Umax on the quality 
of the source and image reconstruction for this particular 
lens system. 

Comparing different lens and source model reconstruc¬ 
tions from different methods is difficult. Different source 
surface brightness reconstruction techniques use differ¬ 
ent number of parameters and thus can have different 
values without changing the lens model. Error models 
and masking do have a significant impact on y^. S etting 


priors may a lso lead to different results (In case of Suyu 


et al. (2013) the position of the sub-clump modeled as 


a singular isothermal sphere was fixed at the position of 
the luminous companion and additional information from 
velocity dispersion measurements). All in all, different 
lens modeling techniques can only be properly compared 
based on mock data. And even on mock data, differ¬ 
ent input types of lens and source models might have a 
significant influence on the relative performance of the 
methods. 


6 . DETECTABILITY OF SUBSTRUCTURE 

One of our main focuses for the model we present is 
to And and quantify substructure within a lens. In this 
section, we want to discuss the following issues: 

1. To what extend are our model basis functions and 
description able to reproduce the true image? 

2. In case of a perfect modeling: Are we able to re¬ 
cover the true parameter configuration in a large 
parameter space? 

3. In case of an imperfect modeling: How does this af¬ 
fect the sensitivity limit, finding and quantification 
of substructures? 

To answer our first question, we refer to our data ex¬ 
ample of RXJ1131-1231 in Section of Figure Even 
though the observed and predicted images can hardly 
be distinguish by eye, the residual map indicates room 
for improvements in our modeling. Nevertheless the fact 
that our mass-to-light prior-free lens model provides ns 
with a realistic solution might indicate that we are not 
far from reality. A priori, we do not know whether the 
solution found in Section is the global minimum of the 
parameter space chosen and therefore the best reachable 
solution within the choices and parameters made. We 
will investigate whether the finder algorithm is able to 
recover the true input parameters when fitting mock im¬ 
ages in the next section. 

6.1. Substructure finding 

To approach question above we take a mock image 
that is highly inspired by RXJ1131-1231 of section We 
keep the image quality fixed (i.e. noise levels, pixd size 
and PSF) but change the lens model such that we have 
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Fig. 5.— Chart of the framework highlighting user interactions. Human interactions are needed for some tasks (green) and decisions 
(blue). Automated tasks are shown in gray. The core of this framework is to clearly split the preprocessing from the fitting algorithm. 




Fig. 6 .— Modeling of RXJ1131-1231 HST ACS F814W image. Upper left: Observed image. Upper middle: Best fit pre-construction. 
Upper right: normalised residuals of the reconstruction. Lower left: Reconstructed source with 1326 shapelet coefficients (up to order 50). 
Lower middle: Convergence model of the lens. Lower right: Magnification model of the lens. 


one big SPEP profile and a minor sub-clump, a spheri¬ 
cal power law potential (SPP). Ideally, we do not want 
to set any prior on the position, mass, shape and num¬ 
ber of substructures. If we were interested in luminous 
sub-structure we could add mass-to-light priors. As we 
want to use our method to potentially detect dark sub¬ 
structure, we are not allowed to give any mass-to-light 
prior. Therefore we want to check whether our algorithm 
finds the preferential parameter space in the model. The 
main focus is on the position of the sub-clump. To ex¬ 


plore our capability of finding sub-clumps, we generate 
mock data with a sub-clump in the lens model at a ran¬ 
dom position. We add Poisson and Gaussian noise on 
the mock image. We then run the convergence method 
on that image with the same weak prior information as 
was done for the real image in section We repeat this 
procedure 10 times. Our result is: 

• Success rate in position of 100%. For our setting 
with a random sampling of the prior parameter 
space, all the runs ended around the right solution 
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(PSO). 

• Detectability down to 10“^ level of the total lens 
mass in the arc of the Einstein ring (MCMC). 

• Time for convergence of about 10^ evaluations of a 
model configuration needed. One evaluation takes 
few seconds. 

For one realization of the input-output process the com¬ 
parison is shown in Figure in terms of convergence and 
magnification and their residuals. We clearly see that the 
position of the sub-clump can be well recovered and the 
appearance of the critical line do match very well. This 
means that there is no other degenerate solution within 
the parameter space that can reproduce a similar feature 
like a sub-clump no matter what combination of source 
surface profile and lens model we chose. This test shows 
that with a ideal model we can find a single sub-clump 
in the lens mass without any prior on its existence and 
position. We also highlight that the relative likelihood 
comparison is large (more than 5 sigma compared with 
the best fit model without a sub-clump). This statement 
in this form can only be made if other effects (such as er¬ 
ror in the PSF model) do not interfere. As we showed for 
the real image in sectionerrors in the PSF model alone 
can potentially lead to aJiigher increase in the minimal 
This test together with the finding of a sub-clump 
in RXJl 131-1231, in both cases without setting priors 
on position, mass-to-light, concentration and mass, are 
encouraging hints that our model approach can extract 
valuable information about a lens system in a rather un¬ 
biased way. 


6 . 2 . Substructure sensitivity 

The last section discussed the potential to recover sub¬ 
structure. We showed that this is possible for substruc¬ 
ture with mass ratios of Mgub ^ lQ~^Mi ens consistent 
with a direct detection of a sub-clump in [Vegetti et al. 
(2012). This implies, that we are sensitive to this mass 


regime in fitting a mass-concentration relation. The con¬ 
centration of the sub-clump is not exactly matched when 
comparing with the most likely solution of the PSO in 
Figure To effectively see how well we can constrain 
certain parameters in the model from the data, we do 
a full likelihood analysis with a MCMC. The mapping 
of the entire parameter space of this realization with an 
MCMC is illustrated in Figure for exactly the same 
realization as Figure The red lines mark the input pa¬ 
rameters. We see, that the input parameters are always 
within 2 sigma of the output parameter distribution. We 
see that there is a partial degeneracy between mass and 
concentration of the sub-clump (7 (clump) and log Oe 
(clump) in Figure Not surprising, it is difficult to con¬ 
strain the profile m a very small clump which itself is 
close to the detection limit. Even better data than HST, 
such as JWST or ALMA can potentially detect clumps 
down to lower mass levels and also constrain the profiles 
of these small clumps. The sensitivity limit relies mostly 
on three criteria: FWHM of PSF, magnification at the 
position of the sub-clump and source surface brightness 
variation at the lensed position of the sub-clump. For 
any different data qualities, telescopes etc, we are able 
to perform such sensitivity tests. 


7. CONCLUSION 

In this paper, we introduced a new strong lensing mod¬ 
eling framework which is based on versatile basis sets for 
the surface brightness and lens model. We identified the 
following aspects of our framework: 

• Its modular design allows for a step-by-step in¬ 
crease in complexity. We are able to determine 
which part of the modeling needs more complexity 
to reproduce a lens system. 

• It allows for automated or semi-automated fitting 
procedures. An adaptive cost function combined 
with a best fit algorithm, allows it to fit different 
strong lens systems without giving specific priors 
to each one of them. This allows for faster and 
more systematic analyses of large numbers of lens 
systems. 

• It is suitable for a wide range of strong lensing sys¬ 
tems and observing conditions. Our framework can 
be applied to various levels of image quality, differ¬ 
ent type of lens system, sizes of the lens, etc, as 
the convergence algorithm does not rely on strong 
initial priors. 

• It features fast source reconstruction techniques. 
The evaluation of the cost function given a posi¬ 
tion in parameter space including the simulation of 
the image can be achieved within seconds. Further¬ 
more our convergence algorithm allows for massive 
parallelization on a distributed computer architec¬ 
ture. 

We further proposed a way to model strong lens systems 
to extract information about the substructure content 
within the lens. Such investigations can potentially pro¬ 
vide useful constraints on the abundances of low mass 
objects. To learn about the dark matter properties from 
strong lensing, one needs to combine well chosen descrip¬ 
tions for the source light and lens mass, algorithm tech¬ 
niques which can find solution in high dimension param¬ 
eter spaces and a combination of different data sets to 
break degeneracies (multi-band imaging, spectroscopy, 
etc.). A special focus has to be made in choosing the 
right set of basis functions and the algorithmic design of 
the convergence method. Our approach is encouraging as 
it succeeds in recovering substructure in the lens without 
setting mass-to-light priors. 
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Fig. 7. — Lens model input-output comparison for convergence and magnification log(abs(magnification)). Without priors on the size or 
position of the sub-clump of order 10® Mq, the PSO can find the lens configuration. The input-output comparison of the image is illustrated 
in Figure The sensitivity for the sub-clump detection is analyzed in Figure We clearly see that the we are sensitive to the position up 
to 0.1”. 
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APPENDIX 

SHAPELETS 


The two dimensional Cartesian shapelets as described in |Refregier| (2003) are the multiplication of two one¬ 
dimensional shapelets 0(x): 

0n(x) = 0ni (^l)0n2 (^ 2 )• 


The one-dimensional Cartesian shapelet is given by: 


(^) 


2^7r^n! 


" Hn{x)e- 


(A2) 


where n is a non-negative integer and the Hermite polynomial of order n. The dimensional basis function is 
described as 

(t>n{x-, p) = (A3) 


In two dimensions, we write 


B„(x;/3)=/? <)>„(/? x). 


(A4) 


This set of basis functions is used for the s ource surface brightness modeling and the lensing potential. To find the 
derivatives of this functions, Refregier (2003) introduced raising and lowering operators which act on the basis functions 
as 

(A5) 


—1: ^Ti T 


the derivative operator can be written as 


= T (a - 

dx a /2 ^ 


(A6) 


and therefore any derivative ca n be written as a superposition of two other shapelet basis functions (for further 
discussions, see |Refregier| (|2003|)) In Figure [sj it is illustrated how shapelet basis functions in the potential space do 
map in the deflection angle and convergence. 


NUMBER OF SHAPELET COEFFICIENTS 

The choice of the maximal order of the shapelet coefficients Umax and its corresponding number m = (rimax + 1) * 
(umax + 2)/2 has a significant influence in the goodness of fit to imaging data. In Figure we illustrate this by 
reconstructing the source surface brightness with different Umax- We use the same lens model as in Figure We 
see that even with Umax = 10 , most of the features in the arcs of the image could be reconstructed qualitatively but 
significant residuals remain. By increasing Umax, more and more details in the source appear and the residuals go 
down. 
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Fig. 8. — The shapelet functions in potential space are plotted in the first row. From top to bottom: (0,0), (1,0), (0,1), (0,1) + (3,0). 
The second and third rows show the deflection angles ai and 0 : 2 - The last row shows the corresponding convergence k. 

PARTICLE SWARM OPTIMIZATION 


The Particle Swarm Optimization (PSO) description was introduced by Kennedy et ah (2001) as a method to 
find the global minima in a high dimensional non-linear distribution. The algorithm is motivated by the physical 
picture of a swarm of particles moving in a physical potential. Every particle gets assigned a position in parameter 
space, a function evaluation (the log likelihood value) and a velocity in parameter space. The particles is assigned a 
swarm ’’physical” behavior when moving up or downwards a potential and a ’’swarm” behavior when redirecting their 
velocity towards the particle at the deepest place of the potential. The PSO process is illustrated in Figure in a 
20 dimensional paramete r space. The implem entation of the PSO algorithm used in this work is publicl y available as 
part o f the CosmoHammer Akeret et al. (|2Q13[) software pac kage. The inertia weight strategy comes from Bansal et al. 
(2011) and the stopping criteria of Zielinski Laur (2008) was implemented. 
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Fig. 9.— The source surface brightness reconstruction of the lens system RXJ1131-1231 is modeled with different shapelet orders rimax- 
Upper panel: The reconstructed image. Middle panel: The normalized residual maps. Lower panel: The reconstructed source. From left 
to right: Increasing number of shapelet order rimax from 10 to 50. 



Fig. 10.— Illustration of the PSO process in 20 dimensions with 160 particles and 200 iterations. Left panel: Evolution of the log 
likelihood of the best fit particle. Middle panel: The difference of the parameter values from the best fit at each iteration relative to the 
end point of the PSO process. Right panel: Velocity of the best fit particle at each iteration. Different colors are used for each of the 
parameters. 



























